AITopics | adapter network


adapter network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Gradient-free Continual Learning

Rypeść, Grzegorz

arXiv.org Artificial Intelligence, Apr-1-2025

Continual learning (CL) presents a fundamental challenge in training neural networks on sequential tasks without experiencing catastrophic forgetting. Traditionally, the dominant approach in CL has been gradient-based optimization, where updates to the network parameters are performed using stochastic gradient descent (SGD) or its variants. However, a major limitation arises when previous data is no longer accessible, as is often assumed in CL settings. In such cases, there is no gradient information available for past data, leading to uncontrolled parameter changes and consequently severe forgetting of previously learned tasks. By shifting focus from data availability to gradient availability, this work opens up new avenues for addressing forgetting in CL. We explore the hypothesis that gradient-free optimization methods can provide a robust alternative to conventional gradient-based continual learning approaches. We discuss the theoretical underpinnings of such methods, analyze their potential advantages and limitations, and present empirical evidence supporting their effectiveness. By reconsidering the fundamental cause of forgetting, this work aims to contribute a fresh perspective to the field of continual learning and inspire novel research directions.
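To make "gradient-free optimization" concrete, here is a minimal sketch of one classic gradient-free optimizer, a (1+1) evolution strategy with the 1/5 success rule. It is chosen only for illustration; the abstract does not say which gradient-free method the paper studies, and the toy loss `task_loss` is hypothetical. The key property is that the update queries loss values only and never computes a gradient.

```python
import random
from typing import Callable, List, Tuple


def one_plus_one_es(
    loss: Callable[[List[float]], float],
    x0: List[float],
    sigma: float = 0.5,
    steps: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `loss` with a (1+1) evolution strategy.

    Only loss values are queried; no gradient is ever computed.
    The step size follows the 1/5 success rule: grow it after an
    accepted move, shrink it after a rejected one.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = loss(x)
    for _ in range(steps):
        # Perturb every coordinate with isotropic Gaussian noise.
        candidate = [xi + rng.gauss(0.0, sigma) for xi in x]
        fc = loss(candidate)
        if fc <= fx:
            # Keep the candidate only if it does not worsen the loss.
            x, fx = candidate, fc
            sigma *= 1.5
        else:
            # 1.5 ** -0.25 exactly offsets 1.5 at a 1/5 acceptance rate.
            sigma *= 1.5 ** -0.25
    return x, fx


# Usage: fit a hypothetical toy "task" whose optimum is (3, 3).
def task_loss(w: List[float]) -> float:
    return sum((wi - 3.0) ** 2 for wi in w)


w, final_loss = one_plus_one_es(task_loss, [0.0, 0.0])
print(w, final_loss)
```

Because acceptance depends only on comparing loss values, the same loop works for any objective that can be evaluated, whether or not it is differentiable.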

artificial intelligence, learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2504.01219

Country:

North America > Canada > Ontario > Toronto (0.15)
Europe > Poland > Masovia Province > Warsaw (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Cross-Problem Learning for Solving Vehicle Routing Problems

Lin, Zhuoyi, Wu, Yaoxin, Zhou, Bangjian, Cao, Zhiguang, Song, Wen, Zhang, Yingqian, Jayavelu, Senthilnath

arXiv.org Artificial Intelligence, Jun-18-2024

Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes cross-problem learning to assist heuristics training for different downstream VRP variants. Particularly, we modularize neural architectures for complex VRPs into 1) the backbone Transformer for tackling the travelling salesman problem (TSP), and 2) the additional lightweight modules for processing problem-specific features in complex VRPs. Accordingly, we propose to pretrain the backbone Transformer for TSP, and then apply it in the process of fine-tuning the Transformer models for each target VRP variant. On the one hand, we fully fine-tune the trained backbone Transformer and problem-specific modules simultaneously.

Among the studied combinatorial optimization problems (COPs), the Vehicle Routing Problems (VRPs) are often favoured and chosen to verify the effectiveness of neural combinatorial optimization (NCO) methods, especially the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). On the one hand, VRPs are widely applied in real-world scenarios such as logistics and drone delivery [Wang and Sheu, 2019; Konstantakopoulos et al., 2022]. On the other hand, VRPs are known to be NP-complete problems, and many of them are challenging to solve efficiently. With the advances of deep learning and its power to automatically learn neural heuristics, NCO methods have demonstrated notable promise against traditional heuristics [Kool et al., 2018; Kwon et al., 2020; Li et al., 2021; Luo et al., 2023]. To further strengthen NCO methods, a number of recent efforts have sought to enhance generalization capabilities, attempting to improve the performance of neural heuristics on VRP instances with distributions or sizes unseen during training [Geisler et al., 2022; …
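The modular split described in the abstract (a shared backbone plus a lightweight problem-specific module) can be sketched structurally as below. This is a toy, pure-Python illustration under stated assumptions: single linear maps stand in for the Transformer and the extra module, the problem-specific embedding is assumed to be added to the backbone embedding, and the names `LinearModule`, `ModularEncoder`, and `cvrp_demand` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


@dataclass
class LinearModule:
    """Hypothetical stand-in for a network module: one linear map."""
    name: str
    weights: Matrix
    trainable: bool = True

    def __call__(self, x: Sequence[float]) -> Vector:
        return matvec(self.weights, x)


@dataclass
class ModularEncoder:
    backbone: LinearModule                # shared, pretrained on TSP (node coordinates only)
    extra: Optional[LinearModule] = None  # lightweight module for problem-specific features

    def embed_node(self, coords: Sequence[float], features: Sequence[float] = ()) -> Vector:
        h = self.backbone(coords)
        if self.extra is not None:
            # Assumption: problem-specific information is fused by addition.
            h = [a + b for a, b in zip(h, self.extra(features))]
        return h

    def trainable_modules(self) -> List[str]:
        modules = [self.backbone] + ([self.extra] if self.extra is not None else [])
        return [m.name for m in modules if m.trainable]


# TSP: backbone only. CVRP: reuse the same backbone and add a demand module.
tsp = ModularEncoder(backbone=LinearModule("backbone", [[1.0, 0.0], [0.0, 1.0]]))
cvrp = ModularEncoder(backbone=tsp.backbone, extra=LinearModule("cvrp_demand", [[0.5], [0.5]]))
print(cvrp.embed_node((2.0, 3.0), (4.0,)), cvrp.trainable_modules())
```

With both modules flagged trainable, this mirrors the "fully fine-tune the backbone and problem-specific modules simultaneously" setting; the `trainable` flag is only a placeholder for whatever parameter-freezing mechanism a real framework provides.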

neural heuristic, transformer, vrp, (14 more...)

arXiv.org Artificial Intelligence

2404.11677

Country:

Asia > Singapore (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Freight & Logistics Services (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)


TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance

Luo, Weiqing, Song, Chonggang, Yi, Lingling, Cheng, Gong

arXiv.org Artificial Intelligence, May-24-2024

Combining semantic information with behavioral data is a crucial research area in recommender systems. A promising approach involves leveraging external knowledge to enrich behavioral-based recommender systems with abundant semantic information. However, this approach faces two primary challenges: denoising raw external knowledge and adapting semantic representations. To address these challenges, we propose an External Knowledge-Enhanced Recommendation method with LLM Assistance (TRAWL). This method utilizes large language models (LLMs) to extract relevant recommendation knowledge from raw external data and employs a contrastive learning strategy for adapter training. Experiments on public datasets and real-world online recommender systems validate the effectiveness of our approach.
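The abstract says only that adapter training uses "a contrastive learning strategy". A common contrastive objective is InfoNCE, sketched below in pure Python as an assumption about what such a strategy could look like, not as TRAWL's actual loss. Here the anchors are adapter-projected semantic embeddings and the positives are behavioral embeddings of the same items; `linear_adapter` and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def linear_adapter(weights: List[Vector], x: Vector) -> Vector:
    """Hypothetical adapter: one linear projection of a semantic embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive partner of anchors[i]; every other row of
    `positives` serves as an in-batch negative for anchors[i].
    """
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must be paired")
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


identity = [[1.0, 0.0], [0.0, 1.0]]
semantic = [linear_adapter(identity, e) for e in ([1.0, 0.0], [0.0, 1.0])]
behavioral_aligned = [[1.0, 0.0], [0.0, 1.0]]
behavioral_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(semantic, behavioral_aligned), info_nce(semantic, behavioral_swapped))
```

The loss is near zero when each semantic embedding lines up with its own item's behavioral embedding and large when the pairs are swapped, which is the signal that would push the adapter weights toward alignment during training.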

external knowledge, information, knowledge, (15 more...)

arXiv.org Artificial Intelligence

2403.06642

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (1.00)

Industry: Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Liu, Fangcheng, Tang, Yehui, Liu, Zhenhua, Ni, Yunsheng, Han, Kai, Wang, Yunhe

arXiv.org Artificial Intelligence, Apr-29-2024

Speculative decoding has demonstrated its effectiveness in accelerating the inference of large language models while maintaining a consistent sampling distribution. However, the conventional approach of training a separate draft model to achieve a satisfactory token acceptance rate can be costly. Drawing inspiration from early exiting, we propose a novel self-speculative decoding framework, Kangaroo, which uses a fixed shallow sub-network as a self-draft model, with the remaining layers serving as the larger target model. We train a lightweight and efficient adapter module on top of the sub-network to bridge the gap between the sub-network and the full model's representation ability. It is noteworthy that the inference latency of the self-draft model may no longer be negligible compared to the large model, necessitating strategies to increase the token acceptance rate while minimizing the drafting steps of the small model. To address this challenge, we introduce an additional early exiting mechanism for generating draft tokens. Specifically, we halt the small model's subsequent prediction during the drafting phase once the confidence level for the current token falls below a certain threshold. Extensive experiments on Spec-Bench demonstrate the effectiveness of Kangaroo. Under single-sequence verification, Kangaroo achieves speedups of up to 1.68× on Spec-Bench, outperforming Medusa-1 with 88.7% fewer additional parameters (67M compared to 591M). The code for Kangaroo is available at https://github.com/Equationliu/Kangaroo.
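The drafting-phase early exit can be sketched as a loop that stops proposing tokens once the draft model's top-1 confidence drops below a threshold, followed by greedy verification against the target model. This is a simplified sketch, not the Kangaroo implementation: the toy `draft_next` and `target_next` functions are hypothetical, the token that falls below the threshold is assumed to be dropped rather than proposed, and only the greedy case of verification is shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Dict[int, float]  # token id -> probability
NextFn = Callable[[Sequence[int]], Dist]


def top1(dist: Dist) -> Tuple[int, float]:
    """Most probable token and its probability."""
    return max(dist.items(), key=lambda kv: kv[1])


def draft_with_early_exit(draft_next: NextFn, prefix: Sequence[int], max_draft: int, threshold: float) -> List[int]:
    """Greedy drafting that stops once the draft model's top-1 confidence drops below `threshold`."""
    context, drafted = list(prefix), []
    for _ in range(max_draft):
        token, confidence = top1(draft_next(context))
        if confidence < threshold:
            break  # assumption: the low-confidence token is dropped, not proposed
        drafted.append(token)
        context.append(token)
    return drafted


def verify_greedy(target_next: NextFn, prefix: Sequence[int], drafted: Sequence[int]) -> List[int]:
    """Accept the longest drafted prefix that matches the target model's greedy choices.

    Real systems score all drafted positions in one parallel forward pass;
    this loop checks them one by one for readability.
    """
    context, accepted = list(prefix), []
    for token in drafted:
        target_token, _ = top1(target_next(context))
        if target_token != token:
            accepted.append(target_token)  # the target's correction replaces the rejected draft
            return accepted
        accepted.append(token)
        context.append(token)
    accepted.append(top1(target_next(context))[0])  # bonus token when every draft is accepted
    return accepted


# Toy models: the target always predicts len(context); the draft agrees but loses confidence at position 4.
def target_next(context: Sequence[int]) -> Dist:
    return {len(context): 1.0}


def draft_next(context: Sequence[int]) -> Dist:
    n = len(context)
    return {n: 0.9, -1: 0.1} if n < 4 else {n: 0.3, -1: 0.25}


drafted = draft_with_early_exit(draft_next, [0], max_draft=8, threshold=0.5)
print(drafted, verify_greedy(target_next, [0], drafted))
```

Stopping early avoids spending draft-model latency on tokens that are unlikely to be accepted, which is the trade-off the abstract describes between acceptance rate and drafting steps.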

arxiv preprint arxiv, kangaroo, token acceptance rate, (12 more...)

arXiv.org Artificial Intelligence

2404.18911

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)


FedSIS: Federated Split Learning with Intermediate Representation Sampling for Privacy-preserving Generalized Face Presentation Attack Detection

Alkhunaizi, Naif, Srivatsan, Koushik, Almalik, Faris, Almakky, Ibrahim, Nandakumar, Karthik

arXiv.org Artificial Intelligence, Aug-22-2023

Lack of generalization to unseen domains/attacks is the Achilles heel of most face presentation attack detection (FacePAD) algorithms. Existing attempts to enhance the generalizability of FacePAD solutions assume that data from multiple source domains are available with a single entity to enable centralized training. In practice, data from different source domains may be collected by diverse entities, who are often unable to share their data due to legal and privacy constraints. While collaborative learning paradigms such as federated learning (FL) can overcome this problem, standard FL methods are ill-suited for domain generalization because they struggle to surmount the twin challenges of handling non-iid client data distributions during training and generalizing to unseen domains during inference. In this work, a novel framework called Federated Split learning with Intermediate representation Sampling (FedSIS) is introduced for privacy-preserving domain generalization. In FedSIS, a hybrid Vision Transformer (ViT) architecture is learned using a combination of FL and split learning to achieve robustness against statistical heterogeneity in the client data distributions without any sharing of raw data (thereby preserving privacy). To further improve generalization to unseen domains, a novel feature augmentation strategy called intermediate representation sampling is employed, and discriminative information from intermediate blocks of a ViT is distilled using a shared adapter network. The FedSIS approach has been evaluated on two well-known benchmarks for cross-domain FacePAD to demonstrate that it is possible to achieve state-of-the-art generalization performance without data sharing. Code: https://github.com/Naiftt/FedSIS
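FedSIS combines federated learning with split learning. On the federated side, a server has to aggregate parameters from clients that never share raw data. The sketch below shows standard FedAvg (sample-weighted parameter averaging) as an assumed stand-in for that step; the abstract does not state which aggregation rule FedSIS uses or which sub-networks are averaged, and the parameter name `adapter` and the client values are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def fed_avg(client_params: List[Params], num_samples: List[int]) -> Params:
    """Sample-weighted average of client parameters (standard FedAvg aggregation).

    Every client must report the same parameter names and shapes.
    Only parameters travel to the server; raw client data never does.
    """
    total = sum(num_samples)
    if not client_params or total <= 0:
        raise ValueError("need at least one client with a positive sample count")
    averaged: Params = {}
    for name, first in client_params[0].items():
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, num_samples)) / total
            for i in range(len(first))
        ]
    return averaged


# Two hypothetical clients holding a flattened "adapter" parameter vector.
clients = [{"adapter": [1.0, 2.0]}, {"adapter": [3.0, 6.0]}]
print(fed_avg(clients, num_samples=[1, 3]))  # → {'adapter': [2.5, 5.0]}
```

Weighting by sample count lets clients with more data pull the global parameters further, one simple way such averaging responds to non-iid client distributions.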

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2308.10236

Country:

Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
(5 more...)


Linear Representation Meta-Reinforcement Learning for Instant Adaptation

Peng, Matt, Zhu, Banghua, Jiao, Jiantao

arXiv.org Machine Learning, Jan-12-2021

This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that is able to extrapolate well to out-of-distribution tasks without the need to reuse data from training, and to adapt almost instantaneously, needing only a few samples at test time. FLAP builds upon the idea of learning a shared linear representation of the policy so that when adapting to a new task, it suffices to predict a set of linear weights. A separate adapter network is trained simultaneously with the policy so that, during adaptation, we can directly use the adapter network to predict these linear weights instead of updating a meta-policy via gradient descent, as in prior meta-RL methods like MAML, to obtain the new policy. Using this separate feed-forward network not only speeds up adaptation run-time significantly, but also generalizes extremely well to very different tasks that prior meta-RL methods fail to generalize to. Experiments on standard continuous-control meta-RL benchmarks show FLAP presenting significantly stronger performance on out-of-distribution tasks, with up to double the average return and up to 8× faster adaptation run-time compared to prior methods.
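The core idea, a shared representation followed by a task-specific linear head whose weights are predicted by an adapter rather than learned by gradient steps, can be sketched as follows. This is a deliberately simplified, hypothetical version: FLAP's adapter is a trained feed-forward network, whereas here it is a single linear map applied to the mean of a few flattened transitions, and all names and numbers are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def policy_action(shared_features: List[float], head: Matrix) -> List[float]:
    """Policy output = task-specific linear head applied to a shared representation."""
    return matvec(head, shared_features)


def adapter_predict_head(transitions: List[List[float]], adapter_w: Matrix, action_dim: int, feature_dim: int) -> Matrix:
    """Hypothetical adapter: average a few flattened transitions, map them linearly, reshape into a head.

    No gradient step is taken at adaptation time; the head comes from one forward pass.
    """
    if not transitions:
        raise ValueError("need at least one transition")
    dim = len(transitions[0])
    mean = [sum(t[i] for t in transitions) / len(transitions) for i in range(dim)]
    flat = matvec(adapter_w, mean)
    if len(flat) != action_dim * feature_dim:
        raise ValueError("adapter output size must equal action_dim * feature_dim")
    return [flat[r * feature_dim:(r + 1) * feature_dim] for r in range(action_dim)]


# A few samples from a new task -> predicted head -> action, with no gradient updates.
head = adapter_predict_head([[1.0, 0.0], [3.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]], action_dim=1, feature_dim=2)
print(head, policy_action([1.0, 1.0], head))
```

Because adaptation is a single forward pass through the adapter, its cost does not grow with a number of inner-loop gradient steps, which is the source of the run-time advantage the abstract reports over MAML-style adaptation.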

adaptation, adapter network, algorithm, (13 more...)

arXiv.org Machine Learning

2101.0475

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Plymouth County > Hanover (0.04)
(2 more...)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
